Juntao Dai

SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence

Jun 01, 2026
Via arXiv

SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning

Jun 01, 2026
Via arXiv

MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models

May 28, 2026
Via arXiv

Stable Reasoning, Unstable Responses: Mitigating LLM Deception via Stability Asymmetry

Mar 27, 2026
Via arXiv

VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment

Mar 05, 2026
Via arXiv

Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment

Feb 18, 2026
Via arXiv

What, Whether and How? Unveiling Process Reward Models for Thinking with Images Reasoning

Feb 09, 2026
Via arXiv

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

Jan 26, 2026
Via arXiv

VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models

Dec 27, 2025
Via arXiv

The Singapore Consensus on Global AI Safety Research Priorities

Jun 25, 2025
Via arXiv